Combining schema and instance information for integrating heterogeneous data sources

Authors

  • Huimin Zhao
  • Sudha Ram
Abstract

Determining the correspondences among heterogeneous data sources, which is critical to integrating them, is a complex and resource-consuming task that demands automated support. We propose an iterative procedure for detecting both schema-level and instance-level correspondences across heterogeneous data sources. Cluster analysis techniques are first used to identify similar schema elements (i.e., relations and attributes). Based on the identified schema-level correspondences, classification techniques are used to identify matching tuples. Statistical analysis techniques are then applied to a preliminary integrated data set to evaluate the relationships among schema elements more accurately. Any improvement in the schema-level correspondences triggers another iteration of the procedure. We have performed an empirical evaluation using real-world heterogeneous data sources and report some promising results (i.e., incremental improvement in the identified correspondences) that demonstrate the utility of the proposed procedure. © 2006 Elsevier B.V. All rights reserved.
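
The iterative loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the specific clustering, classification, or statistical methods, so the sketch swaps in simple stand-ins: a threshold on combined name and value-overlap similarity for the schema-level step, a threshold classifier on mean field similarity for tuple matching, and mean value agreement over matched tuples for re-evaluation. All function names, thresholds, and the toy data are hypothetical, and each source is assumed to be a single relation given as a list of dicts (relation-level matching is omitted).

```python
from difflib import SequenceMatcher
from itertools import product

# Minimal sketch under stated assumptions; NOT the authors' method.
# Hypothetical thresholds: the abstract does not give any values.
SCHEMA_THRESHOLD = 0.6
TUPLE_THRESHOLD = 0.8
AGREEMENT_THRESHOLD = 0.9


def text_sim(x, y):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, str(x).lower(), str(y).lower()).ratio()


def attribute_sim(name_a, vals_a, name_b, vals_b):
    """Average of attribute-name similarity and value-set Jaccard overlap."""
    set_a = {str(v).lower() for v in vals_a}
    set_b = {str(v).lower() for v in vals_b}
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 0.0
    return 0.5 * text_sim(name_a, name_b) + 0.5 * jaccard


def schema_correspondences(rows_a, rows_b):
    """Schema level: pair attributes whose combined similarity reaches
    SCHEMA_THRESHOLD (a simple stand-in for cluster analysis).
    Assumes each source is one non-empty relation with uniform rows."""
    corr = set()
    for a, b in product(rows_a[0], rows_b[0]):
        vals_a = [row[a] for row in rows_a]
        vals_b = [row[b] for row in rows_b]
        if attribute_sim(a, vals_a, b, vals_b) >= SCHEMA_THRESHOLD:
            corr.add((a, b))
    return corr


def match_tuples(rows_a, rows_b, corr):
    """Instance level: classify a tuple pair as a match when its mean
    similarity over corresponding attributes reaches TUPLE_THRESHOLD."""
    if not corr:
        return []
    matches = []
    for i, j in product(range(len(rows_a)), range(len(rows_b))):
        score = sum(text_sim(rows_a[i][a], rows_b[j][b]) for a, b in corr) / len(corr)
        if score >= TUPLE_THRESHOLD:
            matches.append((i, j))
    return matches


def reevaluate(rows_a, rows_b, matches):
    """Re-score every attribute pair on the matched tuples (the preliminary
    integrated data) and keep pairs whose values agree on average."""
    if not matches:
        return set()
    corr = set()
    for a, b in product(rows_a[0], rows_b[0]):
        agreement = sum(text_sim(rows_a[i][a], rows_b[j][b]) for i, j in matches)
        if agreement / len(matches) >= AGREEMENT_THRESHOLD:
            corr.add((a, b))
    return corr


def integrate(rows_a, rows_b, max_iter=10):
    """Repeat tuple matching and re-evaluation until the schema-level
    correspondences stop changing (or max_iter is reached)."""
    corr = schema_correspondences(rows_a, rows_b)
    for _ in range(max_iter):
        matches = match_tuples(rows_a, rows_b, corr)
        new_corr = reevaluate(rows_a, rows_b, matches)
        if not new_corr or new_corr == corr:
            return corr, matches
        corr = new_corr
    return corr, match_tuples(rows_a, rows_b, corr)


# Hypothetical toy data: "code" and "ref" have dissimilar names and only
# partly overlapping values, so the schema-level step alone misses them.
rows_a = [
    {"name": "alice", "code": "101"},
    {"name": "bob", "code": "202"},
    {"name": "carol", "code": "303"},
]
rows_b = [
    {"name": "alice", "ref": "101"},
    {"name": "bob", "ref": "202"},
    {"name": "dave", "ref": "404"},
]

print(schema_correspondences(rows_a, rows_b))
print(integrate(rows_a, rows_b))
```

On this toy data the schema-level step finds only the ("name", "name") pair. The tuples for alice and bob are then classified as matches, and their identical code/ref values lead the re-evaluation step to add ("code", "ref"); the loop stops once a further pass leaves the correspondences unchanged. A real system would replace each stand-in with the clustering, classification, and statistical techniques the paper actually evaluates.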

Similar resources

RiMOM2: A Flexible Ontology Matching Framework

With the development of the Linking Open Data (LOD) project, a large number of semantic datasets have been published on the Web. Due to the open and distributed nature of the Web, the published data may be heterogeneous at both the schema level and the instance level. Matching the entities of different datasets is very important for integrating information from different data sources. Recently, much ...

Combining Heterogeneous Data Sources through Query Correspondence Assertions

The WWW today offers free access to a wealth of heterogeneous data sources. Combining related data from different sources in a comfortable and automatic fashion is not possible. We present our approach to this problem that is based on a declarative representation of the content of heterogeneous data sources with respect to a global schema. We describe our language to express these c...

Automatic Methods for Integrating Biomedical Data Sources in a Mediator-Based System

The information needed by biologists and physicians for research purposes is distributed over many heterogeneous sources. Integration systems provide a single, centralized and homogeneous interface for users to query multiple information sources simultaneously. The major limitation of integration systems, including mediator-based systems, is that the tasks involved in their creation and mainten...

Semantic Mappings for the Integration of XML and RDF Sources

A huge amount of data on the Web may be heterogeneous with respect to syntax, schemata and semantics. For instance, XML and RDF provide two completely different paradigms for modeling Web data. In this paper, we focus on the issue of mapping representations in an ontology-based framework that aims at integrating XML and RDF sources. We propose a solution that utilizes a new mapping language cal...

Intersection Schemas as a Dataspace Integration Technique

This paper introduces the concept of Intersection Schemas in the field of heterogeneous data integration and dataspaces. We introduce a technique for incrementally integrating heterogeneous data sources by specifying semantic overlaps between sets of extensional schemas using bidirectional schema transformations, and automatically combining them into a global schema at each iteration of the int...

Journal:
  • Data Knowl. Eng.

Volume: 61

Publication date: 2007